Search Results: "Robert Collins"

16 June 2008

Robert Collins: 17 Jun 2008

Launchpad, please stop mailing me mine own comments on bugs. I know what I said. kthxbye

14 June 2008

Robert Collins: 14 Jun 2008

Rethinking annotate: I was recently reminded of Bonsai for querying VCS history; GNOME runs a Bonsai instance. This got me thinking about 'bzr annotate', and more generally about the problem of figuring out code. It seems to me that 'bzr annotate' is, like all annotates I've seen, pretty poor at really explaining how things came to be - you have to annotate several versions, cross-reference revision history and so on. 'bzr gannotate' is helpful, but still not awesome. I wondered whether searching might be a better metaphor for getting some sort of handle on what is going on. Of course, we don't have a fast enough search for bzr to make this plausible. So I wrote one: bzr-search, in my hobby time (my work time is entirely devoted to landing shallow-branches for bzr, which will make a huge difference to pushing new branches to hosting sites like Launchpad). bzr-search is alpha quality at the moment (though there are no bugs that I'm aware of). It's mainly missing optimisation, plus features and capabilities that would be useful, like meaningful phrase searching, stemming, and optional case insensitivity on individual searches. That said, I've tried it on some fairly big projects - like my copy of Python here:
time bzr search socket inet_pton
(about 30 hits, first one up in 1 second)...
real    0m2.957s
user    0m2.768s
sys     0m0.180s
The index run takes some time (as you might expect - though as I noted, it hasn't been optimised as such). Once indexed, a branch will be kept up to date automatically on push/pull/commit operations. I realise search is a long slope to get good results on, but hey - I'm not trying to compete with Google :). I wanted something that had the following key characteristics:
* Worked when offline
* Simple to use
* Easy to install
Which I've achieved - I'm extremely happy with this plugin. What's really cool, though, is that other developers have picked it up and already integrated it into loggerhead and bzr-eclipse. I don't have a screenshot for loggerhead yet, but here's an old one. This old one does not show the path of a hit, nor the content summaries, which current bzr-search versions create.
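The essence of such a search - an inverted index from terms to the revisions and paths containing them, with multi-term queries intersecting the posting sets - can be sketched in a few lines. This is a toy illustration of the general technique, not bzr-search's actual design or API:

```python
from collections import defaultdict

class ToyRevisionIndex:
    """Toy inverted index over versioned text: term -> set of (revision, path)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index_text(self, revision_id, path, text):
        # Naive tokenisation; a real indexer would stem, handle punctuation, etc.
        for term in text.lower().split():
            self._postings[term].add((revision_id, path))

    def search(self, *terms):
        """Return the hits containing every term (an AND query)."""
        sets = [self._postings.get(t.lower(), set()) for t in terms]
        if not sets:
            return set()
        return set.intersection(*sets)

idx = ToyRevisionIndex()
idx.index_text("rev-1", "socketmodule.c", "socket inet_pton helper")
idx.index_text("rev-2", "README", "socket docs")
print(sorted(idx.search("socket", "inet_pton")))  # [('rev-1', 'socketmodule.c')]
```

Keeping the index up to date on push/pull/commit then amounts to calling the indexing step for each newly arrived revision.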

9 June 2008

Robert Collins: 10 Jun 2008

Recently I read about a cool bugfix for gdb in the Novell bugtracker, via planet.gnome.org. I ported the fix to the Ubuntu gdb package, and Martin Pitt promptly extended it to include an amd64 fix as well. I thought I would contribute the enhanced patch back to the Novell bugtracker. This required creating a new Novell login, as my old CNE details are so far back I can't remember them at all. However, I hit a hard stop when I saw this at the bottom of the form: "By completing this form, I am giving Novell and/or Novell's partners permission to contact me regarding Novell products and services." No thank you, I don't want to be contacted. WTF.

8 June 2008

Robert Collins: 8 Jun 2008

So, the last lazyweb question I asked had good results. Time to try again: what's a good text index system that is:
* python-accessible
* cross-platform and trivially installable (Windows users)
* flexible (we have plain text, structured data, etc., and a back-end storage area which is only accessible via the bzr VFS in the general case)
* fast (upwards of 10^6 documents)?
pylucene fails the trivially-installable test (apt-cache search lucene -> no python bindings), and the bindings are reputed to be SWIG :(. xapian might be a candidate, though I have a suspicion that SWIG is there as well from the reading I have done so far - and we'd have to implement our own BackEndManager subclass back in python. That might be tricky - my experience with python bindings is that folk tend to think of trivial consumers only, not of python providing core parts of the system :(. So I'm hoping there is a Better Answer just lurking out there... Updates: sphinx looks possible, but is about the same as xapian - it will need a custom storage backend. Google Desktop is out (apart from anything else, there is no way to change the location documents are stored in, nor any indication of a python API to control what is indexed). It looks like I need to be considerably more clear :). I'm looking for something to index historical bzr content, such that indices can be reused in a broad manner (e.g. index a branch on your webserver), are specific to a branch/repository (so you don't get hits for e.g. the working tree of a branch), with a programmatic API (so that the bzr client can manage all of this), and with no requirement for a daemon (low barrier to entry for non-admin users).

4 June 2008

Robert Collins: 4 Jun 2008

So I've been playing with Mnemosyne recently, using it to help brush up on my woeful Latin vocabulary. I thought it would be a good idea to get some of that data out of my head and into Ubuntu (which has a Latin translation). Imagine my surprise when, after installing the Latin language pack (through the GUI), I could not log into Ubuntu in Latin?! It turns out that there is no Latin locale in Ubuntu, or indeed in glibc. This is kind of strange (there is an Esperanto locale). Remember that locales combine language and location - they describe how to format money, numbers, telephone details and so on. So clearly, I needed to add a Latin locale. I could add one for just me (e.g. la_AU), or I could add a generic one (helpfully using AU values) on the betting chance that at this point there are not enough folk wishing to log in in Latin (after all, you can't!) for us to need one per country. And even more so, doing la_AU doesn't make a lot of sense - there isn't a pt_AU locale even though there are Portuguese speakers living in Australia. (The root issue here is that location and language are conflated. POSIX, I hate thee.) So, a quick crash course in locales and some copy and paste later, and there is a Latin locale. Installing that on my system got me a Latin locale, but gdm still wouldn't let me select it. It turns out that gdm feels the urge to maintain its own list of what locales exist, and what to call them. I thought duplication in software was a bad idea, but perhaps I don't understand the problem space enough. Anyhow, time to fix it. And because this is something other people may be interested in, and the patches are not yet in Ubuntu (because upstream glibc may choose a different locale code, e.g. la_AU), I've finally had reason to activate my PPA on Launchpad, so there are now binary packages for hardy for anyone that wants to play with this!
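Python exposes this per-locale formatting data directly, which makes the "locales describe how to format numbers and money" point concrete. A minimal sketch using the always-available "C" locale (a real locale such as en_US.UTF-8, or a hypothetical la_AU, supplies grouping and currency data that "C" deliberately lacks):

```python
import locale

# The portable "C" locale ships no grouping or currency data, so numbers
# come out plain; a real installed locale would insert thousands separators
# from its LC_NUMERIC tables and currency symbols from LC_MONETARY.
locale.setlocale(locale.LC_ALL, "C")
print(locale.format_string("%d", 1234567, grouping=True))  # 1234567
print(locale.localeconv()["decimal_point"])                # .
```

Adding a new locale is exactly a matter of providing those tables (the "copy and paste" above) and compiling them so glibc can find them.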

23 May 2008

Robert Collins: 23 May 2008

This week I've been at UDS in Prague, looking at some possible ways to deploy bzr for packaging (which is a hot topic: developers don't want to change workflows without a concrete benefit, and definitely don't want to pay a cost for doing so - e.g. having to have all of history locally just to make a trivial change). One of the discussions inspired a scalability test for bzr - not how we think we'd deploy bzr for Ubuntu developers, just a test to understand how it would scale *if* we did it this way. Lars Wirzenius has a habit of testing VCS systems' capabilities in various ways, including importing the Debian/Ubuntu source archive into them. He kindly ran a test using bzr, creating a single shared repository with one branch in it per source package. This took a few hours to generate (I'm not sure of the exact figure - we forgot to time it - but it was started in the afternoon and finished in the morning). The resulting repository has 21GB in its .bzr/repository/packs directory, and 500MB in its .bzr/repository/indices directory. There are 30 pack files, the largest of which is 16GB, and the smallest a few hundred kB. In general VCS terms this repository has 16000 heads and 16000 commits (because we didn't import deep archive history). But what about performance? It's currently copying to a machine where I can do some serious benchmarks using this repository. I do have some quick and dirty figures, though. Branching a single package (libyanfs-java) from its branch within the repository to a new standalone branch with a cold cache took ~5 seconds. Branching again from the repository, now the needed data is in page cache, took 0.6 seconds. Branching from the newly created branch to another new standalone branch took 0.3 seconds. There is a clear slowdown occurring here. Including startup costs, the time to make a new branch is doubled by the branch being in the repository. However, as the repository is 16000 times the size, the scaling factor (2/16000) is pretty darn good.
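The scaling claim works out as quick arithmetic on the warm-cache timings above:

```python
# Warm-cache timings from the benchmark above (seconds)
standalone = 0.3   # branch from a standalone branch to a new standalone branch
in_repo = 0.6      # branch from within the 16000-branch shared repository

overhead = in_repo / standalone      # 2.0: branching cost doubles in the big repo
size_ratio = 16000                   # the repo holds ~16000 times one branch
scaling_factor = overhead / size_ratio
print(scaling_factor)                # 0.000125
```

In other words, growing the repository four orders of magnitude only doubled the cost of a branch operation.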
I'm stoked at this result, as I think it demonstrates just what the underlying pack store is capable of. We are working on streamlining the upper layers of bzr to make better and better use of the underlying store. For instance, John Meinel has just done this for 'bzr missing' and 'bzr uncommit'. Now I must go, time for breakfast! Woo!

8 March 2008

Robert Collins: 8 Mar 2008

Best breakfast place if you are in London: Roast. Yum. (Also the most expensive, I suspect.)

26 February 2008

Robert Collins: 26 Feb 2008

I'm very happy to announce that Canonical are hosting a Squid meetup in London this coming Saturday and Sunday, the 1st and 2nd of March. Any developers (in the broad sense - folk doing coding/testing/documenting/community support) are very welcome to attend. As it is a weekend and the office building has security, you need to contact me to arrange to come - just rocking up won't work :). We'll be there all Saturday, and Sunday through to mid-afternoon. The Canonical London office is in Millbank Tower (http://en.wikipedia.org/wiki/Millbank_Tower). So if you want to come by, please drop me a mail. We'll be getting very technical very quickly, I expect - for folk wanting a purely social meetup, I'm going to pick a reasonable place to meet for food and (optionally) alcohol on Saturday evening. I'll post details here mid-Friday.

16 February 2008

Robert Collins: 17 Feb 2008

Just an observation on the user interface of mobile phone chargers. My phone runs flat all the time, and it's all UI. The phone manual sayeth: "Do not leave the phone on the charger once it is charged; doing so will reduce battery life." But the phone gives no signal when it's charged, so I have to stand over it while it's charging to ensure I unplug it at the right moment. As a result, I don't charge it when I'm in a rush, and it never gets charged when I'm about to do something else - I will forget it. Blech. How much can a 'disconnect when charged' circuit really cost?

10 February 2008

Robert Collins: 11 Feb 2008

For simpler tracing of python code than my snippet...: python -mtrace -t program.py
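The same machinery is also available programmatically through the stdlib trace module, which is handy when you only want to trace one function rather than a whole program. A minimal sketch:

```python
import io
import sys
import trace

def greet(name):
    message = "hello, " + name
    return message

# count=0, trace=1: print each line as it executes, collect no coverage counts
tracer = trace.Trace(count=0, trace=1)

buf = io.StringIO()
old_stdout = sys.stdout
sys.stdout = buf          # capture the trace output so we can inspect it
try:
    result = tracer.runfunc(greet, "world")
finally:
    sys.stdout = old_stdout

print(result)                     # hello, world
print("greet" in buf.getvalue())  # True: the trace names the traced function
```

`python -m trace -t program.py` is just this, applied to a whole script.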

23 January 2008

Robert Collins: 23 Jan 2008

Tracing python programs. Today, Evan Dandrea asked a general question: "Where is set -x for python?" A quick google for sys.settrace found some code snippets. I thought this was nice, but surely you want to be able to just trace an arbitrary program. So I present a 'quick hack' (5 minutes precisely :)) to do that, based on the previous link's final version:
#!/usr/bin/env python
import linecache
import os
import os.path
import sys

def traceit(frame, event, arg):
    if event == "line":
        lineno = frame.f_lineno
        filename = frame.f_globals["__file__"]
        if (filename.endswith(".pyc") or
            filename.endswith(".pyo")):
            filename = filename[:-1]
        name = frame.f_globals["__name__"]
        line = linecache.getline(filename, lineno)
        print "%s:%s: %s" % (name, lineno, line.rstrip())
    return traceit

def main():
    search_path = os.environ.get('PATH',
        '').split(os.path.pathsep)
    argv = sys.argv[1:]
    if not argv:
        raise Exception("No command to trace supplied")
    command = argv[0]
    if os.path.sep not in command:
        for path in search_path:
            if os.path.exists(os.path.join(path, command)):
                command = os.path.join(path, command)
                break
    del sys.argv[0]
    source = open(command, 'rt')
    exec_symbols = dict(globals())
    exec_symbols['__name__'] = '__main__'
    sys.settrace(traceit)
    exec source in exec_symbols, exec_symbols

main()

22 January 2008

Robert Collins: 22 Jan 2008

I send mail from my laptop via a local smarthost-with-auth install of exim4. Recently I got motivated to set up the SMTP submission port for this, as I got tired of borked hotel wifi intercepting SMTP, and was behind a firewall that allowed no SMTP out... It was pretty simple: on my mail server, enable listening on port 587 by putting 'daemon_smtp_ports = smtp : 587' in before the local_interfaces line; and on my laptop, edit the 'remote_smtp_smarthost' stanza to add 'port = 587'. Yay to fewer mail headaches.
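Pulled out of the prose, the two changes amount to the following fragments (treat these as sketches - the exact file locations depend on how your exim4 configuration is laid out):

```
# On the mail server, before the local_interfaces line:
daemon_smtp_ports = smtp : 587

# On the laptop, added inside the 'remote_smtp_smarthost' stanza:
port = 587
```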

10 January 2008

Robert Collins: 11 Jan 2008

Thank you, lazyweb: a number of folk have written to me pointing out Netem. One in particular, Yusuf Goolamabbas, even provided a set of wrapper scripts for Netem that I'm going to be digging into next week. Netem is built around various lower-level tools like tc, which is good (tc is what I was using years ago). I'm hopeful it will be really easy to use, and I will blog something when I've used it in anger :)
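For the record, basic netem usage through tc looks roughly like this (device name and numbers are illustrative, and the commands need root):

```
# Add 100ms of delay and 1% packet loss to eth0 (illustrative values)
tc qdisc add dev eth0 root netem delay 100ms loss 1%

# Change the parameters in place: 200ms delay with 10ms of jitter
tc qdisc change dev eth0 root netem delay 200ms 10ms

# Remove the emulation again
tc qdisc del dev eth0 root netem
```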

9 January 2008

Robert Collins: 10 Jan 2008

Dear lazyweb, in bzr development we are now working primarily on network performance. One of the key things about being sure we have improved things is automated, repeatable benchmarks. And for those to be useful in networking environments, we need to control latency, bandwidth, and packet loss. I know this isn't a new problem, but it was about 5 years ago that I last did this sort of thing. What are the best tools today (for Linux :))? Ideally I'd be able to bring up a bunch of local addresses like 127.0.1.1 or 127.0.0.2 with different properties - such that traffic from 127.0.0.1 to 127.0.0.2 will simulate being uploaded over ADSL, and 127.0.0.2 to 127.0.0.1 will simulate being downloaded over ADSL.

26 October 2007

Robert Collins: 26 Oct 2007

Flying to the US for UDS in Boston... and this time no dreaded AAAA's. It seems the US doesn't hate me quite so much - much more pleasant this time. Still, being fingerprinted on entering a country is rather irritating; back home we only do that to criminals.

15 August 2007

Robert Collins: 15 Aug 2007

When are two identical changes the same, and when aren't they? There's a little bit of debate started by Andrew Cowie posting about unmixing the paint. Matt Palmer followed up with a claim that a particular technique used by Andrew is dangerous, and finally Andrew Bennetts makes the point that text conflicts are a small subset of merge conflicts. That said, one critical task for a version control system is the merge command. Let's define merge at a human level as "reproduce the changes made in branch A in my branch B". There are a lot of taste choices that can be made without breaking this definition. For instance, a merge that combines all the individual changes into one - losing the individual commit deltas - meets this description. So does a merge which requires all text conflicts to be resolved during the merge command's execution, or one that does not give a human a chance to review the merged tree before recording it as a commit. So if the goal of merge is to reproduce these other changes, then we are essentially trying to infer what the *change* was. For example, in an ideal world, merging a branch that changes all "log messages of floating points to 6 digit scale" would know enough to catch all new log messages added in my branch, regardless of language, actual API used, etc. But that is fantasy at the moment. The best we can do today depends on how we capture the change. For instance, Darcs allows some changes to be captured as symbol-changing patches, and others as regular textual diffs. So the problem of whether arriving at the same result is acceptable can be rephrased as 'when is arriving at the same result correct or incorrect?'. For instance, if I write a patch and put it up as plain text on a website, and two people developing $foo download it and apply it, they have duplicate changes - but it's clearly correct that a merge between them should not error on this.
On the other hand, the example Andrew Bennetts quotes in his post is a valid example of two people making the same change, but with the line needing a change during the merge to remain correct. Here's another example, though: if I commit something faulty to my branch, and you pull from me before I fix it, then while I fix the bug you also fix it - the same way. That is another example of no-conflict being correct. If it's possible for either answer - conflict, or do not conflict - to be correct, then what should a VCS author do? There are several choices here. I think that our job is to assess what the maximum harm from choosing the wrong default is, and the likelihood of that occurring, and then make a choice. Short of fantasy, no merge is, in general, definitely good or bad - your QA process (such as an automatic test suite) needs to run regardless of the VCS's logic. The risk of a bad merge is relatively low, because you should be testing, and if the merge is wrong you can just not commit it, or roll it back. So our job in merge is to make it as likely as possible that your test suite will pass when you have done the merge, without further human work. This is very different to trying to always conflict whenever we cannot be 100% sure that the text is what a human would have created. It's actually harder to take this approach than conflicting - conflicting is easy.
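The "identical change on both sides should not conflict" case falls straight out of the standard base/ours/theirs rule used by three-way merges. A toy sketch of that rule for a single line (illustrative only - not bzr's actual merge implementation):

```python
def merge_line(base, ours, theirs):
    """Three-way merge of one line; returns (merged_text, conflicted)."""
    if ours == theirs:       # both sides made the same change (or no change)
        return ours, False
    if ours == base:         # only their side changed: take theirs
        return theirs, False
    if theirs == base:       # only our side changed: take ours
        return ours, False
    return None, True        # both sides changed, differently: conflict

# Two branches independently apply the identical fix: no conflict.
print(merge_line("retrun x", "return x", "return x"))  # ('return x', False)
```

The debate above is precisely about the first branch of this rule: textual identity is only a heuristic for the changes being semantically the same.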

14 July 2007

Robert Collins: 14 Jul 2007

So we're here in sunny Vilnius, sprinting on bzr. I thought I'd write up some of what we've achieved. On Thursday, Wouter got most of the nested-trees branch merged up with bzr.dev, but with about 500 tests failing ;). Jelmer has introduced a new parameter to the sprout API call on BzrDir, called 'limit_to_revisions', which if supplied is a set of revisions that forms a strict limit on the amount of data to be fetched. This is intended to be the API by which the desire for limiting history - to achieve shallow branches/history horizons - is communicated at the 'bzr branch' or 'bzr checkout' stage. This API is tested, so everything passes, but nothing chooses to implement it yet. I was hacking on commit (somewhat indirectly) by working on the repository-wide indexing logic we will require if we wish to move to a more free-form database with semi-arbitrary keys mapping to opaque data (at the lowest level). I got a core GraphIndex up at this point. Once complete, this should halve the amount of I/O we perform during commit. There was some other hacking going on too - Tim Hatch did some tweaks to bzr-hg, Jelmer some to bzr-svn, and so on. On Friday we carried on with the same basic targets: Wouter fixed the 500+ failing tests, so now it's only in the 5-10 range, and he has been debugging them one by one since. Jelmer implemented the limit_to_revisions parameter all the way down to the knit level for non-delta-compressed knits (e.g. revisions.knit), which made knit1 repositories support the parameter and got the 'supports' branches of the interface tests being executed. I developed CombinedGraphIndex and have sent the low-level index stuff up for review, and followed that up by implementing a KnitIndex implementation that layers on the GraphIndex API to allow the use of .knit files without .kndx. This new index API allows us to switch out the indexing code and format much more easily than previously - the knit index logic is decoupled from the generic indexing facilities.
It also allows us to record arbitrary-parent compression chains. Finally, Jelmer implemented the wishlist item for bzr-gtk of a notification-area widget that can be used to enable bzr-avahi watching, to share commits with the local LAN, and to get notified of commits on the local LAN. Today is the last day of the sprint, and I hope that we'll get the nested-trees branch passing all tests, get the limit_to_revisions parameter causing delta-compressed knits to only copy data outside the listed parameters up to the first full text, and have an experimental repository format that uses the new index layer for at least a couple of its knits (e.g. revisions and inventory, or revisions and signatures).
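The core idea of a graph index - semi-arbitrary keys mapping to opaque values, plus references to parent keys so that ancestry can be walked - can be sketched like this. This is a toy illustration; the names and methods are not the real bzr GraphIndex API:

```python
class ToyGraphIndex:
    """Toy sketch: keys map to opaque values plus references to parent keys."""

    def __init__(self):
        self._nodes = {}  # key -> (value, parent_keys)

    def add_node(self, key, value, parents=()):
        self._nodes[key] = (value, tuple(parents))

    def iter_ancestry(self, keys):
        """Yield (key, value, parents) for the given keys and all ancestors."""
        pending, seen = list(keys), set()
        while pending:
            key = pending.pop()
            if key in seen or key not in self._nodes:
                continue
            seen.add(key)
            value, parents = self._nodes[key]
            yield key, value, parents
            pending.extend(parents)

idx = ToyGraphIndex()
idx.add_node("rev-1", "data-1")
idx.add_node("rev-2", "data-2", parents=("rev-1",))
idx.add_node("rev-3", "data-3", parents=("rev-2",))
print(sorted(k for k, _, _ in idx.iter_ancestry(["rev-3"])))
# ['rev-1', 'rev-2', 'rev-3']
```

A history-limiting parameter like limit_to_revisions then corresponds to cutting the ancestry walk short at a chosen set of keys.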

7 July 2007

Robert Collins: 7 Jul 2007

Ran into an interesting thing a couple of days ago. It looks like Mercurial suffers from a race condition wherein a size-preserving change, made within the same second as the last hg command's completion (or possibly the last time hg observed the file), will be ignored by hg. This isn't particularly bad for regular human users (as long as you don't have your editor open while running commands in the background), but it's pretty harsh for scripting users - size-preserving changes are not *that* uncommon. I'm immensely glad that we don't have this race within bzr (even on FAT filesystems, where you only get last-modified to within 2 seconds!)
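The race comes from status checks that compare only file size and whole-second mtime. A small demonstration of why that pair of attributes cannot distinguish a size-preserving edit made within the same second (here we pin the timestamps with os.utime rather than racing the clock):

```python
import os
import tempfile

def status_key(path):
    """What a size+mtime dirty-checker sees for a file."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "f.txt")

with open(path, "w") as f:
    f.write("foo")
os.utime(path, (1000, 1000))   # pin the timestamp: first observation
before = status_key(path)

with open(path, "w") as f:
    f.write("bar")             # a size-preserving change...
os.utime(path, (1000, 1000))   # ...landing within the same second
after = status_key(path)

print(before == after)  # True: the edit is invisible to a size+mtime check
```

A checker that also compares content hashes (or that treats files touched in the current second as unknown, as bzr effectively does) avoids the race.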

5 July 2007

Robert Collins: 5 Jul 2007

Just spent some time bringing bzr-avahi up to date so it plays nicely with current bzr. This gives it integration with bzr-dbus (and thus 'bzr lan-notify') and 'bzr commit-notify' (in the bzr-gtk plugin).

3 July 2007

Robert Collins: 3 Jul 2007

I want to know when we will get interesting talks like this happening! Competing on the basis of speed.
